[ARM32] Eliminate red zone usage in runtime stubs #129398
On ARM32 Linux, the area below SP is not guaranteed to be preserved across signal delivery. Replace red zone reads/writes with explicit stack adjustments (push/pop) in:
- NativeAOT interop thunks (ldr pc dispatch, no stack intermediate)
- NativeAOT UniversalTransition (caller pushes args onto stack)
- NativeAOT interface dispatch stubs (PROLOG_STACK_ALLOC instead of sub-SP stores)
- CoreCLR VTableCallStub (pre-indexed str/post-indexed ldr)
Guarded by FEATURE_AVOID_RED_ZONE, enabled for ARM32 non-Windows targets.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @agocke, @dotnet/ilc-contrib
Windows ARM32 is no longer supported.
Windows ARM32 is no longer supported, so every ARM32 target is Linux. The red zone avoidance is always needed: remove the preprocessor guard and delete the old red zone code paths entirely.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7cc9b73 to 59bc77c
The ldr pc dispatch needs only 12 bytes (mov r12 + ldr pc), no padding required. This increases thunks per page from 204 to 341 (67% more). Also shorten verbose comments per review feedback.
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
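For reference, the 204 and 341 figures are consistent with dividing a 4096-byte block by the old and new thunk sizes. A minimal sketch of that arithmetic follows. The 4096-byte block size and both helper names are assumptions made for illustration; they are not taken from ThunksMapping.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: thunk stubs are packed into a 4096-byte block. The PR text does
// not state the block size; 4096 is used because it reproduces 204 and 341.
const std::size_t kAssumedBlockBytes = 4096;

// Hypothetical helper: number of whole thunks that fit in one block.
inline std::size_t ThunksPerBlock(std::size_t blockBytes, std::size_t thunkBytes)
{
    assert(thunkBytes > 0);
    return blockBytes / thunkBytes; // integer division drops a partial thunk
}

// Hypothetical helper: percentage increase from oldCount to newCount,
// rounded to the nearest integer.
inline std::size_t PercentIncrease(std::size_t oldCount, std::size_t newCount)
{
    assert(oldCount > 0 && newCount >= oldCount);
    return ((newCount - oldCount) * 100 + oldCount / 2) / oldCount;
}
```

Under these assumptions, `ThunksPerBlock(kAssumedBlockBytes, 20)` gives 204, `ThunksPerBlock(kAssumedBlockBytes, 12)` gives 341, and `PercentIncrease(204, 341)` gives 67.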
- StubDispatch: use PROLOG_PUSH/EPILOG_POP {r1,r2} instead of manual
STACK_ALLOC + str/ldr
- UniversalTransition: replace interleaved ldr/push dance with a single
PROLOG_PUSH {r0-r3} then load caller args from known stack offsets
- Clean up stale red zone comments
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
59bc77c to 87288df
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
Segfaults in many Linux ARM32 NAOT tests. Could you please take a look?
Azure Pipelines successfully started running 1 pipeline(s).
@MichalStrehovsky PTAL
Pull request overview
This PR updates several ARM32 stubs and NativeAOT transitions to avoid writing below sp (red zone) by switching to explicit stack adjustments (push/pop / stack alloc), and updates related thunk/transition conventions accordingly.
Changes:
- CoreCLR ARM32 interface/vtable-related stubs: replace red-zone saves/restores with stack-based sequences.
- NativeAOT ARM32 thunk and interop paths: shrink thunk stubs by branching via `ldr pc` while preserving `r12` as the thunk data pointer, and adjust `RhCommonStub` accordingly.
- NativeAOT ARM32 universal transition: change extra-argument passing to caller-pushed stack args and update the corresponding stack frame layout and unwind helper logic.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/coreclr/vm/arm/virtualcallstubcpu.hpp | Updates VTableCall stub encoding/size logic to use push/pop-style stack ops instead of red zone. |
| src/coreclr/runtime/arm/StubDispatch.S | Replaces red-zone register spills in cached interface dispatch stubs and adjusts slow-path arg passing to universal transition. |
| src/coreclr/nativeaot/Runtime/ThunksMapping.cpp | Changes ARM thunk stub shape and size to branch via ldr pc and keep r12 as data pointer. |
| src/coreclr/nativeaot/Runtime/StackFrameIterator.cpp | Updates ARM universal transition stack frame layout to account for caller-pushed extra args. |
| src/coreclr/nativeaot/Runtime/EHHelpers.cpp | Adjusts ARM unwind helper to compensate for new interface dispatch stack usage on null-this AV. |
| src/coreclr/nativeaot/Runtime/arm/UniversalTransition.S | Switches universal transition extra args from red zone to caller-pushed stack args and updates prolog/epilog accordingly. |
| src/coreclr/nativeaot/Runtime/arm/InteropThunksHelpers.S | Updates RhCommonStub to consume r12 directly (no red-zone load). |
| src/coreclr/nativeaot/Runtime/arm/DispatchResolve.S | Replaces red-zone spills with stack pushes and updates slow-path argument setup for universal transition. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
All Arm32 test failures are known |
Co-authored-by: Michal Strehovský <MichalStrehovsky@users.noreply.github.com>
…USH/EPILOG_POP
- DispatchResolve.S: use PROLOG_PUSH/EPILOG_POP for {r3,r4,r5,r6,r8}, add
.save {r1,r2} at Hashtable entry, drop lr from push list
- UniversalTransition.S: rewrite prolog to preserve original frame layout
(push r0-r1, capture caller args, store r2-r3 into caller slots)
- StackFrameIterator.cpp: revert to original UniversalTransitionStackFrame
layout (no m_callerPushedArgs)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
Thanks @jkotas and @MichalStrehovsky for the thorough review and guidance! The prolog trick to preserve the original frame layout was particularly elegant; avoiding the TransitionBlock.cs changes made this much cleaner. Appreciated the push toward idiomatic ARM patterns (PROLOG_PUSH/EPILOG_POP, r8 as scratch) and the thunk size reduction as a bonus. Learned a lot from this one.
@cshung Thank you for fixing this! It is likely the source of some of the intermittent arm32 crashes. How did you find the problem?
I am trying to get it to run on low-end devices without virtual memory support. On those platforms, using red zone will fail pretty easily. |
On ARM32 Linux, the area below SP is not guaranteed to be preserved across signal delivery. The runtime previously used the red zone (writing below SP without adjusting it) in several stubs, which can cause silent corruption or crashes when a signal is delivered at the wrong moment.
This PR eliminates all red zone usage in ARM32 runtime stubs by replacing sub-SP reads/writes with explicit stack adjustments (push/pop):
- Interop thunks (`ThunksMapping.cpp`): use `ldr pc` dispatch directly from r12, no stack intermediate. This also shrinks THUNK_SIZE from 20 to 12 bytes.
- Interface dispatch stubs (`DispatchResolve.S`, `StubDispatch.S`): `PROLOG_PUSH`/`EPILOG_POP` instead of red zone stores.
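The hazard described above can be illustrated with a toy model. The sketch below is not runtime code: `ToyStack`, `DeliverSignal`, and both spill helpers are hypothetical names, and signal delivery is modeled simply as the kernel overwriting a few words immediately below SP. It only shows why a value stored below SP can be lost while a pushed value survives.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of a full-descending stack: sp indexes the lowest live word.
struct ToyStack
{
    std::array<std::uint32_t, 64> mem{};
    std::size_t sp = 32;

    void Push(std::uint32_t v) { mem[--sp] = v; }   // sp moves first, then store
    std::uint32_t Pop() { return mem[sp++]; }       // load, then sp moves back

    // Assumed model of signal delivery: a signal frame is written
    // immediately below the current sp, clobbering whatever was there.
    void DeliverSignal(std::size_t frameWords)
    {
        assert(frameWords <= sp);
        for (std::size_t i = 1; i <= frameWords; ++i)
            mem[sp - i] = 0xDEADBEEFu;
    }
};

// Red zone style: store below sp without moving it, then reload.
inline std::uint32_t SpillBelowSp(ToyStack& s, std::uint32_t value)
{
    s.mem[s.sp - 1] = value;    // roughly: str r1, [sp, #-4]
    s.DeliverSignal(8);         // signal arrives between store and load
    return s.mem[s.sp - 1];     // roughly: ldr r1, [sp, #-4]
}

// Explicit stack adjustment: the slot is above sp while the signal runs.
inline std::uint32_t SpillWithPush(ToyStack& s, std::uint32_t value)
{
    s.Push(value);              // roughly: push {r1}
    s.DeliverSignal(8);         // frame lands below the pushed slot
    return s.Pop();             // roughly: pop {r1}
}
```

In this model `SpillBelowSp` returns the clobber pattern instead of the stored value, while `SpillWithPush` returns the original value and leaves `sp` where it started.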